The present invention models noise as a time-correlated Gaussian
random process, parameterized by its a priori Power Spectral Density
(PSD) versus frequency, $P_N(f)$, where $f$ is the frequency. The
noise spectral amplitude $n(f)$ has the distribution function shown
in Equation 1. $P_N(f)$ is dynamically updated throughout the
processing. In the following, frequency dependence will be made
explicit only as needed. Also, consistent with most technical
discussions in this field, the term "power" will generally refer to
the PSD.

Equation 1 

$$f_n(n) = \frac{2n}{P_N}\,\exp\!\left(-\frac{n^2}{P_N}\right)$$
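As a minimal sketch of Equation 1, the density of the noise spectral
amplitude can be evaluated and checked numerically: it should
integrate to one and have a mean-square value equal to the noise PSD.
The function names and the value of `P_N` below are illustrative
assumptions, not part of the disclosure.

```python
import math

def noise_amplitude_pdf(n, p_noise):
    """Equation 1: density of the noise spectral amplitude n for noise PSD p_noise."""
    return (2.0 * n / p_noise) * math.exp(-n * n / p_noise)

def integrate(func, lo, hi, steps=20000):
    """Plain midpoint-rule integration, adequate for this smooth density."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (k + 0.5) * h) for k in range(steps))

P_N = 2.0  # hypothetical noise power
total = integrate(lambda n: noise_amplitude_pdf(n, P_N), 0.0, 20.0)
mean_square = integrate(lambda n: n * n * noise_amplitude_pdf(n, P_N), 0.0, 20.0)
```

Here `total` approximates the normalization of the density and
`mean_square` approximates the expected noise power.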

The distribution function of speech is modeled as a GMM of
time-correlated samples, leading to a distribution function for the
speech spectral amplitude $s(f)$ as shown in Equation 2, where
$\delta(s)$ is a one-sided Dirac delta function. The first term on
the right hand side (RHS) of Equation 2 represents a signal of zero
power, thus capturing the possibility that no signal of interest is
present. The components of the summation in the second term on the
RHS of Equation 2 are the components of the GMM model for the speech
distribution function.

Equation 2

$$f_s(s) = (1 - q_s)\,\delta(s) + q_s \sum_i a_i\,\frac{2s}{p_i}\,\exp\!\left(-\frac{s^2}{p_i}\right)$$

This speech model has two sets of frequency-band-dependent
parameters which are dynamically updated during the processing,
$\{P_s(f)\}$ and $\{q_s(f)\}$. The first is the a priori PSD of the
speech, assuming that a speech signal is present at the frequency
and time of interest. The second is the a priori probability of a
speech signal being present at that frequency and time. The speech
distribution function also has a number of added parameters,
$\{a_i\} = \{a_1, a_2, \ldots, a_N\}$ and
$\{p_i^0\} = \{p_1^0, p_2^0, \ldots, p_N^0\}$. The $\{a_i\}$ are the
weights of the $N$ Gaussian components of the GMM, and the
$\{p_i^0\}$ are the powers of each component when the speech PSD is
normalized to $P_s(f) = 1$. In practice, $P_s(f)$ and $\{p_i^0\}$ are
combined into a parameter set denoted as $\{p_i(f)\}$, where
$p_i(f) = p_i^0 P_s(f)$.
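The speech model of Equation 2 and the scaling $p_i(f) = p_i^0 P_s(f)$
can be sketched as follows. The three-component weights and
normalized powers are hypothetical values chosen only for
illustration (they are not trained values from the disclosure), and
the choice $\sum_i a_i p_i^0 = 1$ is an assumed reading of the
normalization to $P_s(f) = 1$.

```python
import math

def scale_powers(normalized_powers, p_speech):
    """Component powers p_i = p_i^0 * P_s for the current band and frame."""
    return [p0 * p_speech for p0 in normalized_powers]

def speech_amplitude_pdf(s, q_s, weights, powers):
    """Continuous part of Equation 2 for s > 0.
    The (1 - q_s) point mass at s = 0 is not a density value and is left out."""
    return q_s * sum(a * (2.0 * s / p) * math.exp(-s * s / p)
                     for a, p in zip(weights, powers))

def integrate(func, lo, hi, steps=60000):
    """Plain midpoint-rule integration."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (k + 0.5) * h) for k in range(steps))

# Hypothetical parameters: sum(a_i) = 1 and sum(a_i * p0_i) = 1
A = [0.5, 0.3, 0.2]
P0 = [0.1, 0.8, 3.55]
Q_S, P_S = 0.5, 4.0

p = scale_powers(P0, P_S)
mass = integrate(lambda s: speech_amplitude_pdf(s, Q_S, A, p), 0.0, 60.0)
power = integrate(lambda s: s * s * speech_amplitude_pdf(s, Q_S, A, p), 0.0, 60.0)
```

Under these assumptions the continuous part carries probability
`Q_S`, and the mean speech power is `Q_S * P_S`.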

While both $P_s(f)$ and $q_s(f)$ are dynamically updated during the
processing, the $\{a_i\}$ and $\{p_i^0\}$ are determined from prior
"training" to optimize processing results as averaged over a
representative body of training data. This may typically be done by
minimizing the mean-squared error (MSE) between noise-free signals
and the results from processing noisy input signals derived from
those signals by mixing them with varying types and levels of
interfering noise. The present invention may typically use five GMM
components (denoted GMM5). However, more or fewer than five
components can be employed. In addition, the $\{a_i\}$ may be
further parameterized by the values of other key quantities,
including but not limited to signal-to-noise ratio (SNR), which are
adaptively and dynamically updated throughout the processing. This
may typically be done by determining different GMM model parameter
values (the $\{a_i\}$ and $\{p_i^0\}$) versus SNR based on training
for different input SNRs, and interpolating between these model
parameter values based on the adaptively estimated input SNR during
the processing. One prior training of a GMM5 leads to a model for
the speech distribution as shown in Figure 1 for $q_s = 0.5$. Also
shown is the corresponding distribution function for a Gaussian
speech model with $q_s = 1$. For presentation purposes, the vertical
axis is actually the distribution function for speech spectral
power, which is simply $f(s^2/P_s)$, and the horizontal axis is
$(s^2/P_s)$.
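The interpolation of trained parameters versus SNR might be sketched
as below. The SNR grid, the weight tables, and the use of linear
interpolation with clamping at the grid ends are all assumptions made
for illustration; the disclosure does not specify the interpolation
rule or the trained values.

```python
from bisect import bisect_right

# Hypothetical training output: one weight set {a_i} per trained input SNR (dB)
SNR_GRID_DB = [0.0, 10.0, 20.0]
WEIGHTS = [
    [0.6, 0.3, 0.1],
    [0.5, 0.3, 0.2],
    [0.4, 0.3, 0.3],
]

def interpolate_weights(snr_db):
    """Linearly interpolate the trained weight sets at the estimated SNR."""
    if snr_db <= SNR_GRID_DB[0]:
        return list(WEIGHTS[0])
    if snr_db >= SNR_GRID_DB[-1]:
        return list(WEIGHTS[-1])
    hi = bisect_right(SNR_GRID_DB, snr_db)   # first grid point above snr_db
    lo = hi - 1
    t = (snr_db - SNR_GRID_DB[lo]) / (SNR_GRID_DB[hi] - SNR_GRID_DB[lo])
    return [(1.0 - t) * w0 + t * w1 for w0, w1 in zip(WEIGHTS[lo], WEIGHTS[hi])]

weights_at_5_db = interpolate_weights(5.0)
```

Because each trained weight set sums to one, a linear blend of two
sets also sums to one, so no renormalization is needed in this
sketch.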
Noise PSD updating is mainly based on the following. Given a priori
distribution functions for the noise and speech spectral amplitudes,
and a new measurement of the noisy signal spectral amplitude,
$r(f)$, a determination is made as to a best a posteriori estimate
of the noise spectral power for use in updating the noise PSD. This
can be expressed in Equation 3, where $\langle n^2 | r \rangle$ is
the expected value of the noise spectral power given the input,
$f(r|n)$ is the input's distribution function conditioned on a noise
spectral amplitude $n$, and $f_r(r)$ is the a priori distribution
function for the noisy input measurement.

Equation 3

$$\langle n^2 | r \rangle = \int dn\; n^2\, f(r|n)\, f_n(n) \,/\, f_r(r)$$

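Since speech and noise are additive, $f(r|n)$ and $f_r(r)$ can be
expressed as

Equation 4

$$f(r|n) = (1 - q_s)\,\delta(r - n) + q_s \sum_i a_i\,\frac{2r}{p_i}\, I_0\!\left(\frac{2rn}{p_i}\right) \exp\!\left(-\frac{r^2 + n^2}{p_i}\right)$$

where $I_0(x)$ is the zeroth-order imaginary Bessel function (that
is, the modified Bessel function of the first kind of order zero),
and

Equation 5

$$f_r(r) = \frac{2r}{P_N}\,\exp\!\left(-\frac{r^2}{P_N}\right)\left[(1 - q_s) + q_s \sum_i \frac{a_i}{1 + S_i}\,\exp\!\left(\frac{r^2}{P_N}\,\frac{S_i}{1 + S_i}\right)\right]$$

where $S_i = p_i / P_N$.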
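As a sanity check on the a priori density of the noisy measurement
(Equation 5), the sketch below evaluates it for a hypothetical
three-component GMM and confirms numerically that it integrates to
one. Parameter values and function names are assumptions. The two
exponentials of Equation 5 are merged into one per component, which
is algebraically identical and avoids overflow at large $r$.

```python
import math

def noisy_amplitude_pdf(r, p_noise, q_s, weights, powers):
    """Equation 5, using exp(-x) * exp(x * S/(1+S)) == exp(-x / (1+S)),
    where x = r^2 / P_N and S = p_i / P_N."""
    x = r * r / p_noise
    mixture = 0.0
    for a, p in zip(weights, powers):
        snr = p / p_noise
        mixture += a / (1.0 + snr) * math.exp(-x / (1.0 + snr))
    return (2.0 * r / p_noise) * ((1.0 - q_s) * math.exp(-x) + q_s * mixture)

def integrate(func, lo, hi, steps=40000):
    """Plain midpoint-rule integration."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (k + 0.5) * h) for k in range(steps))

# Hypothetical parameters
P_N, Q_S = 1.0, 0.5
A = [0.5, 0.3, 0.2]
P = [0.2, 1.6, 7.1]

area = integrate(lambda r: noisy_amplitude_pdf(r, P_N, Q_S, A, P), 0.0, 40.0)
```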

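This leads to the result

Equation 6

$$\langle n^2 | r \rangle = \frac{(1 - q_s)\,r^2 + q_s P_N \sum_i a_i\,\frac{S_i}{(1+S_i)^2}\left(1 + \frac{r^2}{P_N}\,\{S_i(1+S_i)\}^{-1}\right)\exp\!\left[\frac{r^2}{P_N}\,\frac{S_i}{1+S_i}\right]}{(1 - q_s) + q_s \sum_i a_i\,(1+S_i)^{-1}\exp\!\left[\frac{r^2}{P_N}\,\frac{S_i}{1+S_i}\right]}$$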
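A minimal sketch of the noise power estimator follows. It evaluates
Equation 3 directly under the stated models: when speech is absent
the whole measurement is noise ($n^2 = r^2$), and for GMM component
$i$ the posterior mean of the noise power is
$r^2/(1+S_i)^2 + P_N S_i/(1+S_i)$. Function and argument names are
assumptions, and for very large $r^2/P_N$ a production version would
need to guard the exponential against overflow.

```python
import math

def noise_power_estimate(r, p_noise, q_s, weights, powers):
    """A posteriori noise power <n^2 | r> for a GMM speech model."""
    x = r * r / p_noise                      # instantaneous r^2 / P_N
    num = (1.0 - q_s) * r * r                # speech absent: all of r is noise
    den = 1.0 - q_s
    for a, p in zip(weights, powers):
        snr = p / p_noise                    # S_i = p_i / P_N
        w = q_s * a / (1.0 + snr) * math.exp(x * snr / (1.0 + snr))
        num += w * (r * r / (1.0 + snr) ** 2 + p_noise * snr / (1.0 + snr))
        den += w
    return num / den

# Single Gaussian component, speech certainly present, S = 3, r^2 / P_N = 4
single = noise_power_estimate(2.0, 1.0, 1.0, [1.0], [3.0])
# High a priori SNR and high input: the estimate stays near the a priori P_N
coasting = noise_power_estimate(10.0, 1.0, 1.0, [1.0], [100.0])
```

With `q_s = 0` the function returns `r * r`, since the measurement is
then attributed entirely to noise.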

The form of this noise estimator for a typical GMM5 speech
distribution is graphically depicted in Figures 2a and 2b, where the
noise estimator from the GMM5 model is shown in solid lines. In
these figures, the vertical axis is
$(\langle n^2 | r \rangle / P_N)^{1/2}$, and the horizontal axis is
$(r^2/P_N)^{1/2}$. The GMM5 results are shown for different SNRs at
$q_s = 1/2$. Corresponding results are shown in dashed lines for a
simple Gaussian speech distribution at $q_s = 1$, and an extended
Gaussian distribution with $q_s = 1/2$.

Figures 2a and 2b show that for high a priori SNR and also high
instantaneous $(r^2/P_N)^{1/2}$, all models infer that the current
noise power is close to the a priori value. Since the speech is
assumed to be dominant at high a priori SNR, given a high input in
terms of $(r^2/P_N)^{1/2}$, the noise power estimate is allowed to
"coast." Conversely, for low SNR and high instantaneous
$(r^2/P_N)^{1/2}$, the Gaussian models overestimate the noise since
they do not anticipate the possibility of occasional strong speech
power as the explanation of the high $(r^2/P_N)^{1/2}$. Gaussian
models also overestimate the noise at low $(r^2/P_N)^{1/2}$, more so
for a simple Gaussian with $q_s = 1$. This is because they also do
not account for a high probability of speech at very low power,
including temporary speech absence. The extended Gaussian model with
$q_s = 0.5$ has the least error here. Lastly, the Gaussian models
also tend to underestimate the noise at intermediate values of
$(r^2/P_N)^{1/2}$, since (relative to GMM5) they expect a higher
probability of speech components in this regime.
The probability of a speech signal being present at each frequency
and time is adaptively estimated and updated throughout the
processing. Using the above described a priori distribution
functions for noise and speech spectral amplitudes, $q_s(r)$, which
is the probability of speech signal presence given a new measurement
of the noisy signal spectral amplitude, can be expressed in
Equations 7, 8, 9 and 10, where $f(r|S)$ is the measurement's
distribution function conditioned on a signal being present.

Equation 7

$$q_s(r) = f(r|S)\, q_s \,/\, f_r(r)$$

The distribution function $f(r|S)$ can be expressed as

Equation 8

$$f(r|S) = \int ds\; f_s^0(s)\, f(r|s)$$

where $f_s^0(s)$ is the GMM from the second term of $f_s(s)$ defined
in Equation 2 and, since speech and noise time samples are additive,

Equation 9

$$f(r|s) = \frac{2r}{P_N}\,\exp\!\left(-\frac{r^2 + s^2}{P_N}\right) I_0\!\left(\frac{2rs}{P_N}\right)$$

This leads to the result

Equation 10

$$q_s(r) = \left[1 + \frac{1 - q_s}{q_s}\left\{\sum_i a_i\,(1+S_i)^{-1}\exp\!\left(\frac{r^2}{P_N}\,\frac{S_i}{1+S_i}\right)\right\}^{-1}\right]^{-1}$$
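The speech presence estimator of Equation 10 can be sketched as
below. It is written in the equivalent form
$q_s \Lambda / ((1 - q_s) + q_s \Lambda)$, with $\Lambda$ the
bracketed sum in Equation 10, so that $q_s = 0$ does not cause a
division by zero. Names and parameter values are illustrative
assumptions.

```python
import math

def speech_presence_posterior(r, p_noise, q_s, weights, powers):
    """A posteriori probability q_s(r) that speech is present (Equation 10)."""
    x = r * r / p_noise
    likelihood_ratio = 0.0                   # f(r|S) relative to the noise-only density
    for a, p in zip(weights, powers):
        snr = p / p_noise                    # S_i = p_i / P_N
        likelihood_ratio += a / (1.0 + snr) * math.exp(x * snr / (1.0 + snr))
    return q_s * likelihood_ratio / ((1.0 - q_s) + q_s * likelihood_ratio)

# Single component with S = 3 and a priori q_s = 0.5
at_zero_input = speech_presence_posterior(0.0, 1.0, 0.5, [1.0], [3.0])
at_high_input = speech_presence_posterior(3.0, 1.0, 0.5, [1.0], [3.0])
```

In this example a near-zero input lowers the presence probability
below its a priori value, while a strong input drives it toward one.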

Figure 3 graphically depicts the $q_s(r)$ estimator defined in
Equation 10 versus $(r^2/P_N)^{1/2}$, for a typical GMM speech
distribution model, at various values of SNR, and $q_s = 1/2$. As
shown, the ability to discriminate speech presence versus absence at
low values of $r^2/P_N$ also requires very high SNR. Compared to a
Gaussian speech model, this is due to the higher probability of
lower power speech components, which also is balanced in the
long-tailed GMM speech model by a higher probability of higher power
speech components.

In a manner similar to the previous explanation, the speech power
versus time and frequency can be estimated using Equations 11 and
12, where $\langle s^2 | r \rangle$ is the a posteriori speech power
(PSD) estimate given a new measurement of noisy signal $r(f)$. The
optimal estimator is as shown in these equations.

Equation 11

$$\langle s^2 | r \rangle = \int ds\; s^2\, f(r|s)\, f_s(s) \,/\, f_r(r)$$
Evaluation of the above leads to the following. 
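Equation 12

$$\langle s^2 | r \rangle = \frac{q_s P_N \sum_i a_i\,\frac{S_i}{(1+S_i)^2}\left(1 + \frac{r^2}{P_N}\,\frac{S_i}{1+S_i}\right)\exp\!\left[\frac{r^2}{P_N}\,\frac{S_i}{1+S_i}\right]}{(1 - q_s) + q_s \sum_i a_i\,(1+S_i)^{-1}\exp\!\left[\frac{r^2}{P_N}\,\frac{S_i}{1+S_i}\right]}$$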

The form of this estimator is depicted in Figures 4a and 4b. In
these figures, the vertical axis is $(\langle s^2 | r \rangle / P_N)$,
and the horizontal axis is $(r^2/P_N)^{1/2}$. GMM5 results are given
for different SNRs and a nominal speech distribution function at
$q_s = 0.5$, and are compared with a Gaussian speech model at
$q_s = 1.0$ and also an extended Gaussian model at $q_s = 0.5$. GMM5
results are in solid lines and Gaussian models are shown as dashed
lines.
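The speech power estimator can be sketched by evaluating Equation 11
directly under the stated models. The assumption made here is the
standard Gaussian result that, for component $i$, the posterior mean
of the speech power is
$r^2 S_i^2/(1+S_i)^2 + P_N S_i/(1+S_i)$, while the speech-absent
branch contributes zero. Names and values are illustrative.

```python
import math

def speech_power_estimate(r, p_noise, q_s, weights, powers):
    """A posteriori speech power <s^2 | r> for a GMM speech model."""
    x = r * r / p_noise
    num = 0.0                                # speech absent contributes s^2 = 0
    den = 1.0 - q_s
    for a, p in zip(weights, powers):
        snr = p / p_noise                    # S_i = p_i / P_N
        gain = snr / (1.0 + snr)             # per-component Wiener gain
        w = q_s * a / (1.0 + snr) * math.exp(x * gain)
        num += w * (gain * gain * r * r + p_noise * gain)
        den += w
    return num / den

# Single component, q_s = 1, S = 3, r^2 / P_N = 4
single = speech_power_estimate(2.0, 1.0, 1.0, [1.0], [3.0])
```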

In a manner similar to the previous explanation, the 
speech spectral amplitude can also be estimated as follows. 

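Equation 13

$$\langle s | r \rangle = r\,\frac{q_s \sum_i a_i\,\frac{S_i}{(1+S_i)^2}\exp\!\left[\frac{r^2}{P_N}\,\frac{S_i}{1+S_i}\right]}{(1 - q_s) + q_s \sum_i a_i\,(1+S_i)^{-1}\exp\!\left[\frac{r^2}{P_N}\,\frac{S_i}{1+S_i}\right]}$$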

Note that in the special case with only one GMM component in the
speech distribution function, and also with $q_s = 1$, the above
expression reduces to a conventional Wiener filter.
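The reduction to a Wiener filter can be checked with a short sketch
of the amplitude estimator. Each GMM component contributes its own
Wiener gain $S_i/(1+S_i)$, weighted by that component's posterior
probability. Function names and numerical values are assumptions
for illustration.

```python
import math

def speech_amplitude_estimate(r, p_noise, q_s, weights, powers):
    """A posteriori speech amplitude <s | r> for a GMM speech model."""
    x = r * r / p_noise
    num = 0.0
    den = 1.0 - q_s
    for a, p in zip(weights, powers):
        snr = p / p_noise                    # S_i = p_i / P_N
        gain = snr / (1.0 + snr)
        w = q_s * a / (1.0 + snr) * math.exp(x * gain)
        num += w * gain * r
        den += w
    return num / den

r, p_noise, p_speech = 2.0, 1.0, 3.0
wiener = r * p_speech / (p_speech + p_noise)             # conventional Wiener output
one_component = speech_amplitude_estimate(r, p_noise, 1.0, [1.0], [p_speech])
with_absence = speech_amplitude_estimate(r, p_noise, 0.5, [1.0], [p_speech])
```

Allowing for speech absence (`q_s = 0.5`) gives a smaller estimate
than the Wiener output for the same input.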

For a typical set of GMM parameters, and at $q_s = 0.5$, and for
different SNRs, the form of this estimator is shown in Figures 5a
and 5b, where it is also compared with a Wiener filter at
$q_s = 1.0$, and also with an extended Wiener filter based on a
Gaussian speech model but with $q_s = 0.5$. In the figures, the
vertical axis is $\langle s | r \rangle / (P_N)^{1/2}$, and the
horizontal axis is $(r^2/P_N)^{1/2}$.



It is further noted that the availability of separate estimates for
both the speech spectral amplitude $\langle s | r \rangle$ and the
speech PSD $\langle s^2 | r \rangle$ allows the option to avoid
explicit evaluation of the noise PSD estimator in Equation 6, since
the same result can also be obtained as follows.

Equation 14

$$\langle n^2 | r \rangle = r^2 - 2r\,\langle s | r \rangle + \langle s^2 | r \rangle$$
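The identity in Equation 14 can be verified numerically. The sketch
below computes the three posterior moments from the same
per-component terms, with the noise power evaluated directly, and
compares that value with the right-hand side of Equation 14. The
per-component moment expressions are the standard Gaussian results
and, like the parameter values, are assumptions of this sketch.

```python
import math

def posterior_moments(r, p_noise, q_s, weights, powers):
    """Return (<s|r>, <s^2|r>, <n^2|r>), with <n^2|r> evaluated directly."""
    x = r * r / p_noise
    den = 1.0 - q_s
    num_s = 0.0
    num_s2 = 0.0
    num_n2 = (1.0 - q_s) * r * r             # speech absent: n^2 = r^2
    for a, p in zip(weights, powers):
        snr = p / p_noise                    # S_i = p_i / P_N
        gain = snr / (1.0 + snr)
        w = q_s * a / (1.0 + snr) * math.exp(x * gain)
        spread = p_noise * gain              # posterior variance P_N * S_i / (1 + S_i)
        den += w
        num_s += w * gain * r
        num_s2 += w * (gain * gain * r * r + spread)
        num_n2 += w * ((1.0 - gain) ** 2 * r * r + spread)
    return num_s / den, num_s2 / den, num_n2 / den

# Hypothetical three-component model
r = 1.7
s1, s2, n2_direct = posterior_moments(r, 1.0, 0.5, [0.5, 0.3, 0.2], [0.2, 1.6, 7.1])
n2_from_identity = r * r - 2.0 * r * s1 + s2   # Equation 14
```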

Figure 6 shows a processing chain for one preferred 
embodiment of the method of the invention. The processing chain 
is outlined in terms of processing steps performed in sequence 
for each successive (overlapping) frame of noisy input. These 
steps are further detailed in the following discussion. While 
this figure indicates a final output based on an estimate of the 
information signal spectral amplitude (equivalent to an optimal 
waveform estimator), the option for outputs based on the signal 
PSD also will be apparent, and may be preferred in certain cases. 

In Figure 6, a noisy signal $y(t)$ (601) is received and is passed
through an analog-to-digital converter (602) to provide a stream of
digital samples of the input signal $\{y_i\}$. A windowing function
is then applied to produce a frame of input samples, which is then
frequency analyzed, typically by Fourier analysis (603), to produce
the complex spectral components $\{r(f)\}$ of the noisy signal in
that frame. Sampling the outputs from a bank of band-pass filters is
also an option for performing such time-frequency analysis. A
preferred frame length is typically 500 milliseconds, but other
frame lengths can be used. Each frame is processed in succession.
Each frame is chosen to overlap with its prior frame by an amount
ranging from 50% to as much as 90%.
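The framing and analysis steps (603) and (604) might look like the
following sketch. The Hann window, the 32-sample frame, the 75%
overlap, and the direct DFT (used only to keep the example
dependency-free; an FFT would be used in practice) are all
assumptions for illustration.

```python
import cmath
import math

def hann(frame_len):
    """Periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / frame_len)
            for k in range(frame_len)]

def windowed_frames(samples, frame_len, overlap):
    """Yield successive windowed frames overlapping by the given fraction."""
    hop = int(round(frame_len * (1.0 - overlap)))
    window = hann(frame_len)
    for start in range(0, len(samples) - frame_len + 1, hop):
        yield [window[k] * samples[start + k] for k in range(frame_len)]

def spectrum(frame):
    """Complex spectral components r(f) for f = 0 .. frame_len / 2."""
    n = len(frame)
    return [sum(frame[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n // 2 + 1)]

# Test tone centered on analysis bin 4 of a 32-sample frame
samples = [math.cos(2.0 * math.pi * 4 * t / 32) for t in range(128)]
psd_frames = [[abs(c) ** 2 for c in spectrum(frame)]
              for frame in windowed_frames(samples, 32, 0.75)]
peak_bins = [max(range(len(psd)), key=psd.__getitem__) for psd in psd_frames]
```

Each entry of `psd_frames` is the noisy-input PSD for one frame, and
`peak_bins` records the strongest bin in each frame.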

At (604) the complex spectral components are converted to the PSD
$P_r(f)$ of the noisy input. At (605) a first estimate of the a
posteriori PSD of the information signal, $S_1^2$, is made using an
implementation of Equation 12 with $q_s = 1$. This represents a
first estimate of the information signal PSD on the condition that a
signal is present. At (606) this quantity is combined in a weighted
combination with the a priori signal PSD $P_s$, to stabilize this
first estimate against errors. The result is denoted as $P_{s1}$.
Then, at (607) a second and typically final estimate of the
information signal PSD, denoted as $P_s$, is made using an
implementation of Equation 12 with $q_s = 1$, now using $P_{s1}$ as
the a priori value for the information signal PSD. In other
implementations of the method of the invention either more or fewer
than two iterations of information signal PSD updating may be
employed, as well as other variations in the details of the
procedure.
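Steps (605) to (607) can be sketched as a two-pass update. The
speech power estimator is written by direct evaluation of Equation
11 with $q_s = 1$; the equal blend weight of 0.5 at step (606) is a
hypothetical choice, since the weighting is not specified here.

```python
import math

def speech_power_given_presence(r, p_noise, p_speech, weights, norm_powers):
    """Speech power estimate with q_s = 1, for a priori speech PSD p_speech."""
    x = r * r / p_noise
    num = 0.0
    den = 0.0
    for a, p0 in zip(weights, norm_powers):
        snr = p0 * p_speech / p_noise        # S_i with p_i = p_i^0 * P_s
        gain = snr / (1.0 + snr)
        w = a / (1.0 + snr) * math.exp(x * gain)
        num += w * (gain * gain * r * r + p_noise * gain)
        den += w
    return num / den

def two_pass_speech_psd(r, p_noise, p_speech_prior, weights, norm_powers, blend=0.5):
    """Return (first estimate, stabilized P_s1, second estimate)."""
    first = speech_power_given_presence(r, p_noise, p_speech_prior,
                                        weights, norm_powers)      # step (605)
    p_s1 = blend * first + (1.0 - blend) * p_speech_prior          # step (606)
    second = speech_power_given_presence(r, p_noise, p_s1,
                                         weights, norm_powers)     # step (607)
    return first, p_s1, second

# Single-component example: r^2 / P_N = 4, a priori P_s = 1
first, p_s1, second = two_pass_speech_psd(2.0, 1.0, 1.0, [1.0], [1.0])
```

In this example the strong input pulls the speech PSD estimate
upward on each pass, from an a priori value of 1.0.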

At (608) the a priori signal presence probability q s is 
updated, using an implementation of Equation 10, with the updated 
signal PSD. At (609) a filter gain for recovering the spectral 
components of the information signal is estimated using updated 
a priori quantities from previous stages and an implementation 
of Equation 13. In some embodiments of the method this filter 
gain is also smoothed versus frequency and also versus time to 
reduce the tendency for producing sporadic output anomalies known 
in the prior art as "musical noise." In other embodiments the
gain may be based on the square root of the updated signal PSD
multiplied by the updated signal presence probability and divided
by the noisy signal PSD, or on a weighted combination of this
gain with the former, with a weighting parameterized by other
quantities made available through the methods of the invention.

At (610) the spectral amplitude gain versus frequency is multiplied
by the corresponding noisy signal input spectral components to
recover the spectral components of the information signal in the
frame being processed. At (611) the recovered information signal
spectral components are converted to time samples, typically using
inverse Fourier analysis techniques, and are overlapped and added to
corresponding time sample outputs from adjacent overlapping frames
using techniques mainly based on the prior art. At (612) these time
samples are passed through a digital-to-analog converter to provide
an analog output if such is desired, or at (616) the digital time
samples are passed to a subsequent digital processing stage if such
is desired.
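Steps (610) and (611) can be sketched as gain application followed
by overlap-add synthesis. The square-root Hann analysis and
synthesis windows, the 50% overlap, the 16-sample frame, and the
direct DFT are assumptions chosen so that a unity gain reproduces
the input exactly on samples covered by two frames. The gains are
assumed real and symmetric in frequency so that the output is real.

```python
import cmath
import math

def dft(values, inverse=False):
    """Direct DFT or inverse DFT (an FFT would be used in practice)."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    out = [sum(values[k] * cmath.exp(sign * 2j * math.pi * f * k / n)
               for k in range(n)) for f in range(n)]
    return [v / n for v in out] if inverse else out

def enhance(samples, gains, frame_len=16):
    """Apply per-bin gains frame by frame and overlap-add the results."""
    hop = frame_len // 2
    window = [math.sqrt(0.5 - 0.5 * math.cos(2.0 * math.pi * k / frame_len))
              for k in range(frame_len)]
    output = [0.0] * len(samples)
    for start in range(0, len(samples) - frame_len + 1, hop):
        noisy = dft([window[k] * samples[start + k] for k in range(frame_len)])
        recovered = dft([g * c for g, c in zip(gains, noisy)], inverse=True)
        for k in range(frame_len):
            output[start + k] += window[k] * recovered[k].real
    return output

samples = [math.sin(0.3 * t) + 0.1 * t for t in range(64)]
output = enhance(samples, [1.0] * 16)
# Samples 8..55 are covered by two frames, where the squared windows sum to one
max_error = max(abs(output[i] - samples[i]) for i in range(8, 56))
```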

Also, at (613) the noise PSD for the frame being analyzed is
estimated, typically using an implementation of Equation 14, which
allows the estimate from Equation 6 to be computed more efficiently
from the other updated quantities already available. Then, at (614)
this current-frame noise PSD estimate is combined with prior-frame
noise power estimates in a weighted average, typically based on
exponential time smoothing with a time constant in the range of 0.2
to 2.0 seconds. This time constant may be adjusted according to the
requirements of the application, and may also be adaptively adjusted
based on quantities that are made available from the methods of the
invention.
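The exponential time smoothing at (614) might be sketched as a
first-order recursion. Deriving the smoothing coefficient as
$\exp(-T_{hop}/\tau)$ from the inter-frame hop time and the time
constant is a common convention assumed here, and the 0.1 second hop
and 1.0 second time constant are illustrative values.

```python
import math

def smoothing_coefficient(hop_seconds, time_constant_seconds):
    """Per-frame weight on the previous noise PSD for a given time constant."""
    return math.exp(-hop_seconds / time_constant_seconds)

def update_noise_psd(previous_psd, frame_estimate, alpha):
    """Weighted average of the prior noise PSD and the current-frame estimate."""
    return alpha * previous_psd + (1.0 - alpha) * frame_estimate

alpha = smoothing_coefficient(0.1, 1.0)
noise_psd = 1.0                               # a priori noise PSD
for _ in range(10):                           # ten frames = one time constant
    noise_psd = update_noise_psd(noise_psd, 2.0, alpha)
```

After one time constant the smoothed PSD has covered a fraction
$1 - e^{-1}$ of the step from 1.0 toward the new level of 2.0.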

The block and symbol at (615), and corresponding uses of this block
and symbol elsewhere in the diagram of Figure 6, represent the
inter-frame time delay that exists between the estimation of
quantities in a current frame of input data and their use as a
priori quantities for the next overlapping frame of input data.

While we have illustrated and described one preferred 
embodiment of the present invention, it is understood that this 
invention is not limited to the precise instructions herein 
disclosed, and the right is reserved to all changes and 
modifications coming within the scope of the invention as defined 
in the following appended claims.



